Controlled Flux Results in Stable Decision Trees
Abstract
This work deals with stability in the incremental induction of decision trees. Stability problems arise when an induction algorithm must revise a decision tree very often and oscillations of the tree between similar concepts decrease learning speed. We introduce a heuristic to tackle this problem and an algorithm that uses this heuristic, and we provide theoretical and experimental backing to justify it.

Subject Areas: AI Algorithms, Machine Learning.

Introduction

Incremental machine learning systems aim to show adaptive behavior by responding to changing environmental factors. A special case of a changing environment occurs when we need to revise the target concept often as new training instances arrive. The decision to drop obsolete information is a complex one. It depends on diverse factors, such as the speed of learning or the dynamics of the acquired concept (how often we really need to update knowledge and how difficult that task can be).

Being able to maintain valid concepts efficiently is what ideally characterizes a successful incremental learning system. Efficiency refers not only to the quality and usability of the acquired knowledge but also to how easily this knowledge can be kept up to date.

This work discusses efficiency issues in the domain of decision trees. A core issue has always been that the unpredictable order in which instances of the training set are presented may trigger substantially different paths through the state space of decision trees. Each step in such a path is a restructuring operation on the current decision tree. During incremental learning, this results in the current concept being continuously modified according to the latest training instances. This lack of stability can severely compromise efficiency.

In this paper we propose a heuristic approach that limits successive unwarranted restructuring steps.
The paper consists of a heuristic method based on speculative reasoning with some theoretical backing, an extensive experimentation section, and a discussion of its wide applicability potential.

Overview of the problem

The paper builds on existing work on the incremental induction of decision trees (Schlimmer and Fisher, 1986; Utgoff, 1989; Utgoff, 1994; Kalles and Morris, 1996). It is based on the ideas first introduced by Utgoff (1989) in his ID5R algorithm and its successor, ITI (Utgoff, 1994). The basic idea of restructuring decision trees for efficiency and integrity purposes, however, cuts across other variants of incremental induction, too (van-de-Velde, 1990).

ID4 (Schlimmer and Fisher, 1986) was the first ID3 variant to tackle incremental learning. ID4 builds the same tree as the basic ID3 algorithm when there is an attribute at each decision node that is clearly the best among its competitors. Whenever the relative ordering of the possible test attributes at a node changes due to new incoming instances, all subtrees below that node are discarded. Sometimes the relative ordering does not stabilize and the decision tree is rebuilt from scratch every time a new training instance arrives.

Utgoff (1988) proposed ID5 and expanded on this idea by selecting the most suitable attribute for a node and restructuring the tree so that this attribute is pulled up from the leaves towards that node. To achieve this he used suitable tree manipulations, called transpositions. These are localized operations that reorder attribute tests while ensuring integrity with tests in lower nodes. After a tree is restructured at a node, subtrees below that node may have attributes that are not the most suitable for splitting. The ID5R algorithm proposed the recursive examination of such subtrees.
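To make the idea of a transposition concrete, the following is a minimal Python sketch of its simplest case: binary tests, with both children of the root already testing the attribute to be pulled up. The names `Leaf`, `Node`, `transpose` and `classify` are hypothetical and introduced only for illustration; they are not taken from ID5, ID5R or ITI, and the sketch ignores the instance counts that those algorithms keep at each node.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str


@dataclass
class Node:
    test: str      # name of the boolean attribute tested at this node
    left: "Tree"   # subtree followed when the attribute is False
    right: "Tree"  # subtree followed when the attribute is True


Tree = Union[Leaf, Node]


def transpose(root: Node) -> Node:
    """Pull the test shared by both children up to the root.

    Assumes (for illustration) that both children are decision nodes
    testing the same attribute; the old root test moves one level down.
    """
    lo, hi = root.left, root.right
    if not (isinstance(lo, Node) and isinstance(hi, Node) and lo.test == hi.test):
        raise ValueError("both children must test the same attribute")
    return Node(
        test=lo.test,
        left=Node(root.test, lo.left, hi.left),
        right=Node(root.test, lo.right, hi.right),
    )


def classify(tree: Tree, instance: dict) -> str:
    """Follow the tests from the root down to a leaf and return its label."""
    while isinstance(tree, Node):
        tree = tree.right if instance[tree.test] else tree.left
    return tree.label


# A tree that tests A at the root and B below it.
before = Node("A",
              Node("B", Leaf("w"), Leaf("x")),
              Node("B", Leaf("y"), Leaf("z")))
after = transpose(before)  # now tests B at the root and A below it
```

The transposed tree classifies every instance exactly as the original does; only the order of the tests changes, which is why the operation can be applied locally.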
In an incremental learning context the mapping between tree nodes and attribute tests is not necessarily stable, even over a short learning period, when the performance margin between competing attributes is narrow. In such a case, "currently best" attributes may be swapped in and out of some subtree roots. This means that computing resources are wasted in switching between similar concepts. ITI inevitably suffers from concept oscillation, as it guarantees compatibility with the decision tree that would be produced had all instances been processed in one batch (using ID3) rather than one at a time. On the other hand, this guarantee establishes ITI as a baseline algorithm against which any claimed improvement should be tested.

A step towards avoiding unnecessary computations is to identify nodes where attribute tests will outperform competitors for a sequence of new training patterns (Kalles and Morris, 1996). Another step is to identify the nodes where, although we could justify a revision, its effects may imply frequent subsequent attribute swaps. To estimate the stability of a revising decision, one must carefully balance the risk of using a tree of (assumed) reduced effectiveness against the pay-off of savings in computations. These savings result from not having to revise past decisions too often.

The aim of this paper is to suggest a solution to this problem. To do that we will carefully depart from ITI's guarantee and show that this departure is affordable.

Footnote 1: In incremental learning, classification is embedded in incorporating new training instances, hence the term "using the tree" in the context of this paper.

Turney (1995) proposes a different notion of stability. He suggests that stability is a desirable property of a learning algorithm if it can induce similar concepts from different training sets, provided that the training instances come from the same population.
His context is quite different from ours; we study stability versus efficiency during a single learning episode. The notion of stability, as put forward by Turney, is very valuable, as he also proposes a metric for quantifying it across many learning domains (and not simply decision trees). However, he does not address efficiency issues as a consequence of the problem we described above. Although we acknowledge the common use of the term "stability" between this work and Turney's, we believe that the two contexts are sufficiently far apart that no confusion will occur.

Description of a heuristic solution

By pulling up the best attribute we invest in a good structure for subsequent training instances, as we hope to speed up the training process in two ways. First, we anticipate a smaller tree, which means that leaves are reached earlier on average. Second, a good tree will more often allocate new instances to leaves of their class instead of splitting existing leaf nodes to accommodate these instances. As a byproduct, smaller trees also require less housekeeping and less effort for later restructuring.

By postponing the tree revision we adopt a more conservative strategy that requires an attribute to prove its worth by a comfortable margin before being assigned the role of a splitting test. The question is then: how long can we postpone restructuring and still have a usable tree?

In terms of learning speed, postponing restructuring operations amounts to deciding to pay small bills for an unknown time period instead of improving the tree infrastructure. This decision is a short-term one; it may be that after a few instances the old tree is indeed not good enough and too expensive to keep using. The number of training instances we can afford to ignore is unknown beforehand. It is possible, however, to estimate it roughly as a function of the usage cost of each step and the cost of the restructuring operation.
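The "small bills versus infrastructure" trade-off can be illustrated with a minimal break-even sketch. It assumes, purely for illustration, that both costs are counted in the same abstract integer units and that the usage cost per instance is constant; neither assumption comes from the paper, and the function name `instances_we_can_ignore` is hypothetical.

```python
def instances_we_can_ignore(restructure_cost: int, usage_cost_per_instance: int) -> int:
    """Largest k such that k * usage_cost_per_instance < restructure_cost.

    Assumption (not from the paper): the per-instance usage cost is a
    constant positive integer, so the accumulated cost grows linearly.
    """
    if usage_cost_per_instance <= 0:
        raise ValueError("usage cost per instance must be positive")
    if restructure_cost <= 0:
        return 0
    # Integer arithmetic keeps the strict inequality exact.
    return (restructure_cost - 1) // usage_cost_per_instance


# With a restructuring cost of 100 units and a usage cost of 7 units per
# instance, 14 instances cost 98 units in total, still below 100.
k = instances_we_can_ignore(100, 7)
```

In the setting described here the usage cost would itself depend on the tree size, so a constant per-instance cost is only a first approximation.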
Cost estimation is a difficult process, especially in the case of heuristic approaches. Similar studies focus on worst-case analysis (Utgoff, 1989), which is hopelessly pessimistic and therefore unsuitable for the purpose of this paper, or provide efficient implementations based on regularity in the data sets (Kalles and Morris, 1996).

To study a more realistic cost situation, we shall adopt some basic elements from average-case analysis. In this analysis we need to make a few assumptions that do not always hold. For this reason we also need a metric that will inform us about the distance of realistic cases from ideal ones. Putting all these together, we derive results that are both realistic and theoretically grounded.

To decide whether to restructure or not, we need to choose a parameter of the tree and use it to build an optimization criterion. Since we focus on learning speed, the clear choice is tree size (Fayyad and Irani, 1990). Size seems to be an acceptable metric of tree quality, especially in the light of new results that show how common algorithms may display marginal differences in accuracy with surprisingly big differences in tree size (Oates and Jensen, 1997). To simplify the analysis we choose to study binary decision trees. Hence the sole parameter that determines tree size is the number of leaves.

The methodology follows this line of reasoning. First, we determine the cost of restructuring a decision tree and the cost of using it. We express both costs as functions of the tree size. However, restructuring costs are incurred by transpositions, whereas usage costs are incurred by tree traversals; we introduce the notion of a basic operation to unify the treatment of these costs. We then try to estimate the number of tree leaves based on the candidate splitting attribute of the tree's root.
At this point, we have established an association between the costs of a decision tree and its root splitting attribute (that attribute's quality is the only quantity that can be calculated during this phase). The final step is to calculate the number of instances for which the accumulated usage cost will be less than the restructuring cost; this is the number of instances that we can afford to ignore.

The first task is to estimate the restructuring cost for a binary decision tree of n leaves. This consists of two basic costs. First, there is the cost of the recursive pull-up of an attribute. Second, there is the cost of restructuring the subtrees (an attribute pull-up may leave the lower tree nodes with attributes of low discriminating capability, and a full-scale revision will entail a recursive node reevaluation). We will now describe the computation of the average restructuring cost of binary decision trees of n leaves.

A first step in determining the restructuring cost is to estimate the number of binary decision trees of n leaves, T(n). One can visualize the presence of a test throughout the whole tree by employing a simple coloring scheme. Under this scheme, a node is black if it uses the particular test and white if it does not. A leaf is black if that test has not been used anywhere in the path from the root to that leaf (and, hence, is available at the leaf for further splitting, when so required). The following figure depicts how a family of trees is derived from a given binary tree.

Figure 1: An example of the derivation of colored binary decision trees.

It could be argued that one should draw all possible decision trees, with all individual node colorings (assigning separate colors to separate tests). Using such an exhaustive enumeration, one could then accommodate in each tree all possible test alternatives for nodes that are otherwise shown in white.
Obviously, the number of trees would then be exponential in the number of leaves. One observation saves us all these computations: a restructuring triggers the pull-up of only one attribute to the root of a tree. This pull-up depends solely on where that attribute lies in the tree; all other attributes are unimportant. If, in a given situation, attribute A is to be pulled up to the root, we need not worry about where attribute B or C lies, or about which attribute currently resides at the root. When attribute A eventually reaches the root, we can start tackling lower-level tree nodes. At that point, however, we can forget the previous coloring and employ a new one, depending on which attribute is currently under scrutiny.

Let us now estimate T(n). Let L(n) be the set of decision trees with n leaves, according to the description above. As the figure below shows, T(0) = 0, T(1) = 1, T(2) = 2, T(3) = 6, T(4) = 21.
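The quoted values of T(n) can be checked with a short Python sketch under one reading of the coloring scheme: the test of interest is used at most once on any root-to-leaf path, so a black internal node has only uncolored structure below it, while a white internal node has two independently colored subtrees. This reading, the resulting recurrence, and the names `shapes` and `colored_trees` are assumptions made here for illustration and are not stated in the text; they do reproduce T(0) through T(4).

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def shapes(n: int) -> int:
    """Number of plain (uncolored) binary tree shapes with n leaves."""
    if n == 0:
        return 0
    if n == 1:
        return 1
    # Split the n leaves between the left and right subtree of the root.
    return sum(shapes(k) * shapes(n - k) for k in range(1, n))


@lru_cache(maxsize=None)
def colored_trees(n: int) -> int:
    """Number of colored binary trees with n leaves, under the assumed reading."""
    if n == 0:
        return 0
    if n == 1:
        return 1  # a single leaf where the test is still available
    # Black root: the test is consumed, so only the shape below it varies.
    black_root = shapes(n)
    # White root: each subtree is colored independently.
    white_root = sum(colored_trees(k) * colored_trees(n - k) for k in range(1, n))
    return black_root + white_root


first_values = [colored_trees(n) for n in range(5)]
print(first_values)  # [0, 1, 2, 6, 21]
```

For n = 4, for instance, the black-root case contributes 5 trees and the white-root case contributes 6 + 4 + 6 = 16, which gives the quoted total of 21.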